Search CORE

11 research outputs found

Document classification based on library catalogue metadata

Author: Roivainen Hege
Publication venue: Helsingfors universitet
Publication date: 01/01/2017

The metadata catalogues of national libraries are good sources of information, because they record nearly all material published in a given period and region. They are usually described comprehensively, so they can serve as sources for quantitative research. In research it is often worthwhile to divide the data into smaller subsets, for example by genre. In many cases, however, gaps in the data reduce its usability. This master's thesis evaluates the possibility of using machine learning to find research-relevant subsets in library catalogues. As a case study I chose the English Short Title Catalogue (ESTC), and as the subset to be found, books of poetry. Genre information for poetry books ought to be annotated, but in real library catalogues it is often missing. To obtain the best result, I used the random forest algorithm with several types of feature vectors traditionally used in authorship attribution and genre classification, as well as with metadata field values. Because library catalogues do not contain the full text of the books, feature selection focused on the words and linguistic properties of the titles. Titles are usually short and carry very little information, so I combined the best-performing features from the different feature vectors and ran the final search with them. The main result of the study was confirmation that using titles to construct features is a viable strategy. The study opens possibilities for identifying subsets with machine learning in the future and for making wider use of library catalogues in quantitative research.
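The thesis classifies catalogue records with a random forest, using features built from titles together with metadata field values. The sketch below illustrates only the title-feature step, as a minimal example under stated assumptions: the ASCII tokenizer, the `VERSE_CUES` word list and the feature names are hypothetical and are not the feature set used in the thesis.

```python
import re
from collections import Counter

# Hypothetical cue words for verse in early modern titles (illustrative only,
# not taken from the thesis).
VERSE_CUES = {"poem", "poems", "verse", "verses", "ode", "odes", "elegy", "sonnet", "ballad"}


def title_features(title: str) -> dict:
    """Build a sparse feature dictionary from a short catalogue title."""
    lowered = title.lower()
    # Simple ASCII tokenizer; an assumption that ignores non-ASCII letters.
    tokens = re.findall(r"[a-z]+", lowered)
    # Bag-of-words counts, prefixed so they cannot clash with other feature names.
    features = Counter(f"w={token}" for token in tokens)
    # Surface features that add some signal when titles are very short.
    features["n_tokens"] = len(tokens)
    features["verse_cues"] = sum(token in VERSE_CUES for token in tokens)
    features["has_in_verse"] = int("in verse" in lowered)
    return dict(features)


print(title_features("An elegy on the death of a friend. In verse."))
```

Dictionaries of this kind could, for example, be vectorized with scikit-learn's `DictVectorizer` and passed to `RandomForestClassifier`; that tooling choice is an assumption, not a description of the thesis code.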

Helsingin yliopiston digitaalinen arkisto

Becoming a state language: Finnish public debate and modal grammar 1820–1917

Author: Kanner Antti
Marjanen Jani
Roivainen Hege
Tahko Tuuli
Publication date: 22/10/2020

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Scaling up bibliographic data science

Author: Lahti Leo
Marjanen Jani
Roivainen Hege
Tolonen Mikko
Publication venue: CEUR-WS.org
Publication date: 01/01/2019

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Bibliographic Data Science and the History of the Book (c. 1500–1800)

Author: Lahti Leo
Marjanen Jani Pekka
Roivainen Hege Henri Markus
Tolonen Mikko Sakari
Publication date: 01/01/2019

National bibliographies have been identified as a crucial resource for historical research on the publishing landscape, but using them requires addressing challenges of data quality, completeness, and interpretation. We call this approach bibliographic data science. In this article, we briefly assess the development of book formats and the vernacularization process in early modern Europe. The work undertaken paves the way for more extensive integration of library catalogs to map the history of the book.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

FigShare

FinnFN 1.0: The Finnish frame semantic database

Author: Haltia Heidi
Laine Antti
Lindén Krister
Luukkonen Juha
Roivainen Hege
Väisänen Niina
Publication date: 14/08/2017

The article describes the process of creating a Finnish-language FrameNet, FinnFN, based on the original English-language FrameNet hosted at the International Computer Science Institute in Berkeley, California. We outline the goals and results of the FinnFN project, especially the creation of the FinnFrame corpus. The main aim of the project was to test the universal applicability of frame semantics by annotating real Finnish using the same frames and annotation conventions as the original Berkeley FrameNet project. From Finnish newspaper corpora, 40,721 sentences were automatically retrieved and manually annotated as example sentences evoking particular frames; these became the FinnFrame corpus. Applying the Berkeley FrameNet annotation conventions to Finnish required some modifications because of Finnish morphology, and a convention for annotating individual morphemes within words was introduced for phenomena such as compounding, comparatives, and case endings. Various questions about cultural salience across the two languages arose during the project, but problematic situations occurred in only a few examples, which we also discuss. The article shows that, barring a few minor instances, the hypothesis that frames are universal is largely confirmed for languages as different as Finnish and English.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Bibliographic Data Science and the History of the Book (c. 1500–1800)

Author: Hege Roivainen
Jani Marjanen
Leo Lahti
Mikko Tolonen
Publication venue: Informa UK Limited
Publication date: 28/10/2022

UTUPub

A Quantitative Approach to Book-Printing in Sweden and Finland, 1640–1828

Author: Hege Roivainen
Jani Marjanen
Leo Lahti
Mikko Tolonen
Publication venue: Informa UK Limited
Publication date: 28/10/2022

Several cities in Sweden have had book-printing facilities since the 1640s. In our quantitative and exploratory analysis of library catalogs from the National Library of Sweden and the National Library of Finland, we identify the general trends in publishing, how book-printing was affected by political events, and how printing developed at different paces in different parts of the realm. We have developed a new method for analyzing the totality of publishing through extensive data harmonization and comprehensive statistical analysis, treating library catalogs not as an endpoint of bibliographic research but as an inherently rich source of information. This enabled a quantitative assessment of printing in the Swedish realm based on the metadata contained in library catalogs. Our data-driven approach to the transformation of public discourse demonstrates that, whereas the amount of printed material grew steadily, political ruptures affected the development of printing. We also suggest that the culture of books and printing is best understood through the dynamics of competing intellectual hubs: the university cities and the political center in Stockholm. This perspective further challenges the dominant, nationally delineated approach in book history.
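The abstract relies on extensive data harmonization before statistical analysis. As a minimal sketch of what that step can involve, assuming imprint places and years arrive as raw strings, the code below normalizes both and counts records per city and decade. The `PLACE_VARIANTS` table and all helper names are hypothetical illustrations, not the authors' pipeline.

```python
import re
from collections import Counter

# Hypothetical mapping of imprint place-name variants to one canonical city
# (illustrative only; a real table would be far larger and curated).
PLACE_VARIANTS = {
    "stockholm": "Stockholm",
    "holmiae": "Stockholm",
    "åbo": "Turku",
    "aboae": "Turku",
    "turku": "Turku",
    "upsala": "Uppsala",
    "upsaliae": "Uppsala",
    "uppsala": "Uppsala",
}


def harmonize_year(raw: str):
    """Return the first four-digit year 1500-1999 in a string like '[1642?]', else None."""
    match = re.search(r"\b(1[5-9]\d\d)\b", raw)
    return int(match.group(1)) if match else None


def harmonize_place(raw: str):
    """Strip punctuation, lowercase, and look up the canonical place (None if unknown)."""
    return PLACE_VARIANTS.get(raw.strip(" .,:;[]?").lower())


def records_per_decade(records):
    """Count (place, decade) pairs, skipping records whose place or year cannot be parsed."""
    counts = Counter()
    for place_raw, year_raw in records:
        place = harmonize_place(place_raw)
        year = harmonize_year(year_raw)
        if place is None or year is None:
            continue
        counts[(place, year // 10 * 10)] += 1
    return counts
```

Records that fail to harmonize are skipped here for brevity; in practice they would more likely be logged for manual review, since silently dropping them biases the counts.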

UTUPub

A National Public Sphere? Analysing the Language, Location and Form of Newspapers in Finland, 1771–1917

Author: Kanner Antti
Lahti Leo
Marjanen Jani
Mäkelä Eetu
Roivainen Hege
Tolonen Mikko
Vaara Ville
Publication date: 30/06/2019

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Analytical Edition Detection In Bibliographic Metadata

Author: Ijaz Ali
Lahti Leo
Roivainen Hege
Publication date: 11/07/2019

The aim of analytical bibliography is to understand books and other printed objects as artifacts, including how they were produced. Bibliographic metadata can reveal important historical trends and resolve questions such as the ordering of editions. In this paper, we present a state-of-the-art analytical approach for identifying editions and determining their order. By providing harmonized data and information on historical developments in book production, this approach can greatly aid projects that aim at large-scale text mining. Current text-mining approaches do not make full use of edition-level information and are therefore limited in scope. Using the ESTC metadata, we have developed harmonization techniques that convert free-form text into more coherent entries for statistical analysis. We also developed a new gold standard with multiple layers of information for validation. Using these data would significantly enhance our understanding of early modern publishing.

Peer reviewed
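As an illustration of converting free-form text into coherent entries, the sketch below maps edition statements such as "The 2d ed." or "The second edition, corrected" to an edition number. It is a minimal example under stated assumptions: the regular expressions, the `ORDINAL_WORDS` table and `harmonize_edition` are hypothetical and do not reproduce the harmonization rules developed for the ESTC data.

```python
import re

# Hypothetical lookup for spelled-out ordinals (illustrative only).
ORDINAL_WORDS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
}


def harmonize_edition(statement: str):
    """Map a free-form edition statement to an edition number, or None if unknown."""
    text = statement.lower()
    # Numeric forms: "2d ed.", "2nd ed.", "10th edition".
    match = re.search(r"\b(\d+)(?:st|nd|rd|th|d)?\s+ed", text)
    if match:
        return int(match.group(1))
    # Spelled-out forms: "the second edition", "third ed.".
    for word, number in ORDINAL_WORDS.items():
        if re.search(rf"\b{word}\s+ed", text):
            return number
    # Statements such as "a new edition" carry no ordinal information.
    return None
```

Sorting records by publication year and then by the harmonized edition number could give a first approximation of edition order; records where the two disagree would be natural candidates for checking against a gold standard.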

Helsingin yliopiston digitaalinen arkisto

Book Formats and Reading Habits in Early Modern Europe

Author: Marjanen Jani
Roivainen Hege
Publication venue: DataverseNL

Abstract and poster of paper 0596, presented at the Digital Humanities Conference 2019 (DH2019), Utrecht, the Netherlands, 9–12 July 2019.